A recent study led by Austria's Complexity Science Hub (CSH) found that although large language models (LLMs) excel at many tasks, they show significant shortcomings when answering advanced history questions. The research team evaluated the historical knowledge of three leading models, OpenAI's GPT-4, Meta's Llama, and Google's Gemini, and the results were disappointing.